ftp.cs.arizona.edu


Internet Message Format | 1994-02-02 | 3KB

Received: by cheltenham.cs.arizona.edu; Wed, 21 Jul 1993 08:44:20 MST
Date: Wed, 21 Jul 93 09:13:07 CDT
From: "Richard L. Goerwitz" <goer@midway.uchicago.edu>
Message-Id: <9307211413.AA01333@midway.uchicago.edu>
To: icon-group@cs.arizona.edu
Subject: Icon for large-scale stuff
Status: R
Errors-To: icon-group-errors@cs.arizona.edu

>Some time ago I re-posted a message from the Linguist List having to do
>with a linguistic software initiative. I had wondered if (and suggested
>that) Icon might become more 'popular' by showing utility in such an
>initiative.

>One reply claimed that Icon would not have such utility since it could not
>manage the massive amounts of data/calculations that are often required in
>linguistic work.

Sounds pretty pessimistic. In fact, I've been using Icon successfully
on large corpora for several years now. Naturally there are some things
Icon does not do well. Despite the natural tendency to be lazy, one
really does have to maintain facility with several languages, both
high- and low-level, in order to construct NLP tools quickly that can do
the job.

One thing to remember is that NL stuff often involves constructing
human-machine interfaces. The software only has to be fast enough to map
NL queries to some primitive set of instructions. This can be done in
real time using Icon-based tools. NLP also may involve processing large
corpora in batch mode. Again, although Icon will not do this sort of
thing as quickly as C, it's certainly no worse than LISP or Prolog, and
these are two of the main languages used for such batch processing. The
idea is that it doesn't matter if the batch processing finishes in one
minute or ten if it's really done in batch mode. Response time is a
problem more for interactive systems.

One final note I might inject here is that Icon performs perfectly well
for multi-megabyte databases, especially under "real" operating systems
with sensible file systems. If you want evidence of this, ftp my silly
"Bibleref" program from cs.arizona.edu. This program can find a passage
requested by the user, decompress it, and display it in just a few
seconds. It can perform word searches almost as quickly (unless you are
looking for something like "the"). If I'd skipped the compression and
decompression phases by using a straight, human-readable database, then
the delays would have been even less.

I confess that there are certain projects I really wouldn't do in Icon.
I discovered one seemingly complex phenomenon in one text to be, in
fact, *almost* right linear, and decided that the natural parsing tool
to use would be YACC + Lex.

Still, it is flatly wrong to claim that Icon would not be useful as a
linguistic research tool. The proof is in the pudding, and I have a lot
of pudding on hand.

   -Richard Goerwitz                goer@midway.uchicago.edu